Weighted Fuzzy-Possibilistic C-Means Over Large Data Sets

Authors

Renxia Wan

Yuelin Gao

Caixia Li

Abstract

Several algorithms for clustering large data sets have been presented to date. Most of them are crisp approaches, which are not well suited to the fuzzy case. In this paper, the authors explore a single-pass approach to fuzzy-possibilistic clustering over large data sets. The basic idea of the proposed approach (weighted fuzzy-possibilistic c-means, WFPCM) is to use a modified possibilistic c-means (PCM) algorithm to cluster the weighted data points and centroids, with one data segment as a unit. Experimental results on both synthetic and real data sets show that WFPCM uses significantly less memory than the fuzzy c-means (FCM) algorithm and the possibilistic c-means (PCM) algorithm. Furthermore, the proposed algorithm shows excellent immunity to noise, avoids splitting or merging the exact clusters into inaccurate ones, and preserves the integrity and purity of the natural classes.

DOI: 10.4018/jdwm.2012100104. International Journal of Data Warehousing and Mining, 8(4), 82-107, October-December 2012. Copyright © 2012, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

... samples by a certain rule such as chi-square or divergence hypothesis (Hathaway et al., 2006). The incremental approaches (Bradley et al., 1998; Farnstrom et al., 2000; Gupta et al., 2004; Karkkainen et al., 2007; Luhr et al., 2009; Nguyen-Hoang et al., 2009; Ning et al., 2009; O’Callaghan et al., 2002; Ramakrishnan et al., 1996; Siddiqui et al., 2009; Wan et al., 2010, 2011) generally maintain past knowledge from previous runs of a clustering algorithm to produce or improve the future clustering model. Nevertheless, as Hore et al. (2007) pointed out, many existing algorithms for large and very large data sets are designed for the crisp case and rarely for the fuzzy case.
This is because fuzzy clustering must repeat its clustering iterations until the optimal solution, or an acceptable approximation of it, is reached, and must therefore scan the data set repeatedly. This can conflict sharply with the requirements of algorithms for processing large data sets. Kwok, Smith, Lozano, and Taniar (2002) clustered an insurance data set with a parallel fuzzy c-means (PFCM) clustering method. Hore, Hall, and Goldgof (2007) presented a single-pass fuzzy c-means algorithm (SP) for clustering large data sets. However, FCM is inherently sensitive to noise, and noise is usually unavoidable in large data sets, so PFCM and SP have considerable trouble in noisy environments.

The possibilistic clustering algorithm was presented by Krishnapuram and Keller (1993, 1996). In this clustering structure, each cluster is disentangled from the others, and the membership values represent the typicality of a point with respect to the class prototypes. The possibilistic clustering algorithm offers higher noise immunity than classical algorithms derived from Bezdek’s fuzzy c-means (FCM) (Bezdek, 1981), but it is sensitive to initialization (Xie et al., 2008).

In this paper, we propose a weighted fuzzy-possibilistic c-means (WFPCM) algorithm for large or very large data sets. WFPCM can produce an excellent clustering result in a single pass through the data set with finite memory allocated. The rest of this paper is organized as follows. Section “Fuzzy c-means (FCM) algorithm” surveys the FCM algorithm. Section “Possibilistic c-means (PCM) algorithm” does the same for PCM. In section “Weighted fuzzy-possibilistic c-means (WFPCM) algorithm,” we present the new algorithm.
The experimental results on both synthetic and real data sets are reported in section “Empirical results and evaluation.” Finally, we draw our conclusions in section “Conclusions.”

FUZZY C-MEANS (FCM) ALGORITHM

The task of the fuzzy c-means (FCM) algorithm (Bezdek, 1981) is to minimize the objective function

$$J_m(U, v) = \sum_{j=1}^{n} \sum_{i=1}^{c} u_{ij}^{m} d_{ij}^{2}$$
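As a rough illustration of the FCM objective, the sketch below evaluates the standard Bezdek form: the sum over clusters i and points j of the membership u_ij raised to the fuzzifier m, times the squared distance between point j and center i. It also shows the usual membership update that goes with that objective. This is a minimal sketch and not the authors' code. The function names, the Euclidean distance, the default m = 2, and the small `eps` guard against zero distances are all assumptions made for the example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fcm_objective(points, centers, memberships, m=2.0):
    """Sum over clusters i and points j of u_ij^m * d_ij^2.

    memberships[i][j] is the membership of point j in cluster i.
    """
    total = 0.0
    for i, center in enumerate(centers):
        for j, point in enumerate(points):
            total += memberships[i][j] ** m * squared_distance(point, center)
    return total


def fcm_memberships(points, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update for fixed centers.

    u_ij is proportional to (1 / d_ij^2)^(1 / (m - 1)), normalized so the
    memberships of each point sum to 1 across clusters.
    eps is a hypothetical guard against division by zero.
    """
    exponent = 1.0 / (m - 1.0)
    columns = []
    for point in points:
        inverse = [
            (1.0 / (squared_distance(point, center) + eps)) ** exponent
            for center in centers
        ]
        norm = sum(inverse)
        columns.append([value / norm for value in inverse])
    # Transpose so the result is indexed as memberships[i][j].
    return [list(row) for row in zip(*columns)]


if __name__ == "__main__":
    # Toy data: two well-separated groups of two points each.
    points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
    centers = [(0.0, 0.5), (5.0, 5.5)]
    u = fcm_memberships(points, centers)
    print("memberships:", u)
    print("objective:", fcm_objective(points, centers, u))
```

Because each point's memberships are forced to sum to 1, a distant outlier still receives substantial membership in some cluster, which is one way to see the noise sensitivity of FCM that the introduction mentions.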


Similar resources

Bilateral Weighted Fuzzy C-Means Clustering

Nowadays, the Fuzzy C-Means method has become one of the most popular clustering methods based on minimization of a criterion function. However, the performance of this clustering algorithm may be significantly degraded in the presence of noise. This paper presents a robust clustering algorithm called Bilateral Weighted Fuzzy C-Means (BWFCM). We used a new objective function that uses some k...


Unsupervised Approach Data Analysis Based on Fuzzy Possibilistic Clustering: Application to Medical Image MRI

The analysis and processing of large data are a challenge for researchers. Several approaches have been used to model these complex data, and they are based on some mathematical theories: fuzzy, probabilistic, possibilistic, and evidence theories. In this work, we propose a new unsupervised classification approach that combines the fuzzy and possibilistic theories; our purpose is to overcome th...


Weighted possibilistic moments of fuzzy numbers with applications to GARCH modeling and option pricing

Carlsson and Fuller [C. Carlsson, R. Fuller, On possibilistic mean value and variance of fuzzy numbers, Fuzzy Sets and Systems 122 (2001) 315–326] have introduced possibilistic mean, variance and covariance of fuzzy numbers and Fuller and Majlender [R. Fuller, P. Majlender, On weighted possibilistic mean and variance of fuzzy numbers, Fuzzy Sets and Systems 136 (2003) 363–374] have introduced the...


Hausdorff Distance Measure Based Interval Fuzzy Possibilistic C-Means Clustering Algorithm

Clustering algorithms have been widely used in artificial intelligence, data mining, machine learning, and related fields. Clustering is an unsupervised classification in which a data set is divided into groups: similar data are assigned to the same group, and dissimilar data are assigned to other groups. In general, to analyze interval data needs Type II fuzzy log...


On weighted possibilistic mean and variance of fuzzy numbers

Dubois and Prade defined an interval-valued expectation of fuzzy numbers, viewing them as consonant random sets. Carlsson and Fullér defined an interval-valued mean value of fuzzy numbers, viewing them as possibility distributions. In this paper we shall introduce the notation of weighted interval-valued possibilistic mean value of fuzzy numbers and investigate its relationship to the interval-...



Journal title:

IJDWM

Volume 8, issue

Pages -

Publication date 2012
